People often compare the US to European countries. This is not a fair comparison, as the population of the US is far greater than that of any European nation. It would be more informative to compare US states to European countries. In this Jupyter Notebook I create a few visualizations to show that US state populations are generally comparable to the populations of European countries.
Choropleths are geographic maps that color in delineated areas based on the value of a specified variable. They are a great visualization technique that can often be easier to read than a classic bar chart. We will use choropleths to highlight population similarities between US states and European countries.
Plotly is an open source Python plotting library (it is also available for other programming languages). It can be used to create over 40 chart types, its visualizations are interactive, and it has a large user base. We will use Plotly to display the population similarities between US states and European countries.
import sys
!{sys.executable} -m pip install plotly
import pandas as pd
import numpy as np
import plotly.express as px
# websites where the data we will be working with originated.
# https://worldpopulationreview.com/states
# https://worldpopulationreview.com/country-rankings/countries-in-europe
# https://worldpopulationreview.com/country-rankings/country-codes
state_pop = pd.read_csv('assets/us_state_pop.csv')
state_pop = state_pop[['State', 'Pop']].copy() # copy so later edits do not act on a slice of the original frame
state_pop = state_pop.rename(columns={'State': 'state', 'Pop': 'pop'})
state_pop['state_code'] = ['CA', 'TX', 'FL', 'NY', 'PA', 'IL', 'OH', 'GA', 'NC', 'MI', 'NJ', 'VA', 'WA', 'AZ', 'MA', 'TN', 'IN', 'MO', 'MD', 'WI', 'CO', 'MN', 'SC', 'AL', 'LA', 'KY', 'OR', 'OK', 'CT', 'UT', 'IA', 'NV', 'AR', 'PR', 'MS', 'KS', 'NM', 'NE', 'ID', 'WV', 'HI', 'NH', 'ME', 'MT', 'RI', 'DE', 'SD', 'ND', 'AK', 'DC', 'VT', 'WY']
all_euro_pop = pd.read_csv('assets/all_euro_pop.csv')
country_codes = pd.read_csv('assets/country_codes.csv') # contains country codes
merged_df = country_codes.merge(all_euro_pop, how='left', on='country') # matching country codes to countries
all_euro = merged_df.dropna() # countries without a European population row are dropped here
all_euro = all_euro[['cca3', 'pop2020', 'country']].copy() # copy to avoid a SettingWithCopyWarning on the next line
all_euro['pop2020'] = all_euro['pop2020'] * 1000 # un-scaling the population count
all_euro = all_euro.drop(index=168) # dropping Russia (index label 168 in this frame) because it is part of both Europe and Asia
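Dropping Russia by the index label 168 only works as long as the CSV files keep their current row order. A minimal sketch of a more robust variant filters on the country code instead. It assumes the `cca3` column holds ISO 3166-1 alpha-3 codes (so Russia is `RUS`); `drop_country` is a hypothetical helper and the toy frame uses placeholder values, not the notebook's data.

```python
import pandas as pd

# Toy frame with placeholder values; the column names mirror all_euro above.
toy_euro = pd.DataFrame(
    {
        "cca3": ["AAA", "RUS", "BBB"],
        "pop2020": [1_000, 2_000, 3_000],  # placeholder numbers, not real populations
        "country": ["Country A", "Russia", "Country B"],
    },
    index=[10, 168, 42],
)


def drop_country(df, code):
    """Return a copy of df without the row(s) whose cca3 code equals `code`."""
    return df[df["cca3"] != code].copy()


without_russia = drop_country(toy_euro, "RUS")
print(without_russia["country"].tolist())
```

Because the filter looks at the code rather than the row position, it keeps working if rows are added to or removed from the source files.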
#choropleths for US
state_fig = px.choropleth(state_pop,
locations = 'state_code', #https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth.html
color="pop", # color scale designator
hover_name="state", # hover bubble info, shown when you hover over the geographical area
locationmode = 'USA-states',
color_continuous_scale=px.colors.sequential.solar_r, #https://plotly.com/python/builtin-colorscales/
range_color=[0,85000000]) #https://plotly.com/python/colorscales/
state_fig.update_layout(
title_text = '2020 Population by US State',
geo_scope='usa') # set world map view location https://plotly.com/python/reference/layout/geo/#layout-geo-scope
#choropleths for Europe
country_fig = px.choropleth(all_euro,
locations = 'cca3', #https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth.html
color="pop2020", # color scale designator
hover_name="country", # hover bubble info
color_continuous_scale=px.colors.sequential.solar_r, #https://plotly.com/python/builtin-colorscales/
range_color=[0,85000000]) #https://plotly.com/python/colorscales/
country_fig.update_layout(
title_text = '2020 Population by European Country',
geo_scope='europe')
country_fig.show()
Even the design of Plotly's choropleth function reflects how rarely US states are compared to European countries: `px.choropleth` takes a single `locationmode`, so one call cannot assign values to US states and European countries at the same time. I therefore display the two choropleths one above the other.
state_fig.show()
country_fig.show()
This is a bar chart of US state populations and European country populations.
# change the df column titles so we can merge the country and state dfs. And add a column to distinguish countries and states
all_euro = all_euro.rename(columns = {'country' : 'area', 'cca3' : 'code', 'pop2020' : 'pop'})
all_euro['region'] = 'europe'
#Georgia display issue fix
all_euro.loc[all_euro['area'] == 'Georgia', 'area'] = 'Georgia_country'
state_pop = state_pop.rename(columns = {'state':'area', 'state_code' : 'code'})
state_pop['region'] = 'usa'
#Georgia display issue fix
state_pop.loc[state_pop['area'] == 'Georgia', 'area'] = 'Georgia_state'
combined_df = pd.concat([all_euro,state_pop]) # use concat to merge the country and state dfs
combined_df = combined_df.sort_values(by=['pop']) #sort by population size
#display the chart
fig = px.bar(combined_df, x='code', y='pop', hover_name='area', labels={'pop':'Population', 'code': 'Area'}, height=400) #https://plotly.com/python-api-reference/generated/plotly.express.bar.html
fig.show()
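The Georgia fix above covers the one name I knew was shared between the two datasets. A minimal sketch of checking for shared names in general, assuming both frames carry an `area` column as they do after the renames above; `shared_names` is a hypothetical helper and the toy frames are placeholders.

```python
import pandas as pd

# Placeholder frames that mimic the 'area' and 'region' columns used above.
toy_countries = pd.DataFrame({"area": ["Georgia", "Country A"], "region": "europe"})
toy_states = pd.DataFrame({"area": ["Georgia", "State B"], "region": "usa"})


def shared_names(left, right, column="area"):
    """Return the sorted values that appear in `column` of both frames."""
    return sorted(set(left[column]) & set(right[column]))


print(shared_names(toy_countries, toy_states))  # prints ['Georgia']
```

Running a check like this before `pd.concat` would flag any other duplicate label that needs a suffix.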
This chart colors the bars by region, separating US states from European countries within the same chart.
fig = px.bar(combined_df, x='code', y='pop', hover_name='area', color='region', labels={'pop':'Population', 'code': 'Area'}, height=400) # set the color
fig.show()
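To put numbers on the overlap visible in the chart above, each state could be paired with the country closest to it in population. The following is a minimal sketch using `pd.merge_asof` with `direction='nearest'`. The frames, names, and values are placeholders, and `nearest_by_population` is a hypothetical helper that is not part of the notebook.

```python
import pandas as pd

# Placeholder data; names and numbers are made up for illustration only.
toy_states = pd.DataFrame(
    {"area": ["State A", "State B", "State C"], "pop": [1_000_000, 5_000_000, 20_000_000]}
)
toy_countries = pd.DataFrame(
    {"area": ["Country X", "Country Y", "Country Z"], "pop": [900_000, 6_000_000, 50_000_000]}
)


def nearest_by_population(states, countries):
    """Pair every state with the country whose population is closest to it."""
    left = states.sort_values("pop").rename(columns={"area": "state", "pop": "state_pop"})
    right = countries.sort_values("pop").rename(
        columns={"area": "country", "pop": "country_pop"}
    )
    # merge_asof needs both key columns sorted; 'nearest' looks in both directions.
    return pd.merge_asof(
        left, right, left_on="state_pop", right_on="country_pop", direction="nearest"
    )


pairs = nearest_by_population(toy_states, toy_countries)
print(pairs[["state", "country"]])
```

With the real `state_pop` and `all_euro` frames, the same call would give one closest country per state, which could then be shown in the hover text of the bar chart.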
This chart highlights how much larger the US population is than the population of any single country in Europe.
usa = {'code':'USA', 'pop':331002651, 'area':'United States of America', 'region':'usa'}
euro_usa = pd.concat([all_euro, pd.DataFrame([usa])], ignore_index=True) # add a row for the USA; DataFrame.append was removed in pandas 2.0
euro_usa = euro_usa.sort_values(by=['pop'])
fig = px.bar(euro_usa, x='code', y='pop', hover_name='area', color='region', labels={'pop':'Population', 'code': 'Area'}, height=400)
fig.show()
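When bars are separated by color, Plotly Express draws each region as its own trace, so by default the states and countries are not interleaved in one population-sorted sequence. I therefore created a bar chart in matplotlib to highlight the population similarities.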
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches #needed to insert a legend
%matplotlib inline
fig = plt.figure(figsize=(80,50)) # create the figure once; a second plt.figure call would leave an empty extra figure
x = combined_df['area']
y = combined_df['pop']
x_pos = [i for i, _ in enumerate(x)]
plt.xticks(x_pos, x, rotation = 90)
plt.tick_params(axis='both', which='major', labelsize=50)
plt.ylabel("Population in 10M",fontsize=100)
red_patch = mpatches.Patch(color='red', label='US state population') # creates a legend patch set to a specific color https://matplotlib.org/tutorials/intermediate/legend_guide.html
blue_patch = mpatches.Patch(color='blue', label='European country population')
plt.legend(handles=[red_patch, blue_patch],loc='upper left',prop={'size': 90}) # creates the legend and sets it to the upper left corner
bar_colors = combined_df['region'].map({'usa': 'red', 'europe': 'blue'}).tolist() # one color per bar, taken from the region column instead of a hand-maintained index list
chart = plt.bar(x, y, color=bar_colors) # US state bars are red, European country bars are blue
plt.show()